AITopics | teacher-student model

Collaborating Authors

teacher-student model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Learning curves of generic features maps for realistic datasets with a teacher-student model

Neural Information Processing SystemsDec-24-2025, 13:06:18 GMT

Teacher-student models provide a framework in which the typical-case performance of high-dimensional supervised learning can be described in closed form.

generic feature map, realistic dataset, teacher-student model, (4 more...)

Neural Information Processing Systems

Industry: Education > Educational Technology > Educational Software (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.83)

Add feedback

Optimal Task Order for Continual Learning of Multiple Tasks

Li, Ziyan, Hiratani, Naoki

arXiv.org Machine LearningFeb-5-2025

Continual learning of multiple tasks remains a major challenge for neural networks. Here, we investigate how task order influences continual learning and propose a strategy for optimizing it. Leveraging a linear teacher-student model with latent factors, we derive an analytical expression relating task similarity and ordering to learning performance. Our analysis reveals two principles that hold under a wide parameter range: (1) tasks should be arranged from the least representative to the most typical, and (2) adjacent tasks should be dissimilar. We validate these rules on both synthetic data and real-world image classification datasets (Fashion-MNIST, CIFAR-10, CIFAR-100), demonstrating consistent performance improvements in both multilayer perceptrons and convolutional neural networks. Our work thus presents a generalizable framework for task-order optimization in task-incremental continual learning.

artificial intelligence, continual learning, machine learning, (17 more...)

arXiv.org Machine Learning

2502.0335

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Education > Educational Setting (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Learning curves of generic features maps for realistic datasets with a teacher-student model

Neural Information Processing SystemsJan-17-2025, 20:05:58 GMT

Teacher-student models provide a framework in which the typical-case performance of high-dimensional supervised learning can be described in closed form. In this paper, we introduce a Gaussian covariate generalisation of the model where the teacher and student can act on different spaces, generated with fixed, but generic feature maps. While still solvable in a closed form, this generalization is able to capture the learning curves for a broad range of realistic data sets, thus redeeming the potential of the teacher-student framework. Our contribution is then two-fold: first, we prove a rigorous formula for the asymptotic training loss and generalisation error. Second, we present a number of situations where the learning curve of the model captures the one of a realistic data set learned with kernel regression and classification, with out-of-the-box feature maps such as random projections or scattering transforms, or with pre-learned ones - such as the features learned by training multi-layer neural networks.

generic feature map, realistic dataset, teacher-student model, (1 more...)

Neural Information Processing Systems

Industry: Education > Educational Technology > Educational Software (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Elevating Skeleton-Based Action Recognition with Efficient Multi-Modality Self-Supervision

Wei, Yiping, Peng, Kunyu, Roitberg, Alina, Zhang, Jiaming, Zheng, Junwei, Liu, Ruiping, Chen, Yufan, Yang, Kailun, Stiefelhagen, Rainer

arXiv.org Artificial IntelligenceJan-10-2024

Self-supervised representation learning for human action recognition has developed rapidly in recent years. Most of the existing works are based on skeleton data while using a multi-modality setup. These works overlooked the differences in performance among modalities, which led to the propagation of erroneous knowledge between modalities while only three fundamental modalities, i.e., joints, bones, and motions are used, hence no additional modalities are explored. In this work, we first propose an Implicit Knowledge Exchange Module (IKEM) which alleviates the propagation of erroneous knowledge between low-performance modalities. Then, we further propose three new modalities to enrich the complementary information between modalities. Finally, to maintain efficiency when introducing new modalities, we propose a novel teacher-student framework to distill the knowledge from the secondary modalities into the mandatory modalities considering the relationship constrained by anchors, positives, and negatives, named relational cross-modality knowledge distillation. The experimental results demonstrate the effectiveness of our approach, unlocking the efficient use of skeleton-based multi-modality data. Source code will be made publicly available at https://github.com/desehuileng0o0/IKEM.

knowledge, modality, recognition, (16 more...)

arXiv.org Artificial Intelligence

2309.12009

Country:

Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
Asia > China (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Training-based Model Refinement and Representation Disagreement for Semi-Supervised Object Detection

Marvasti-Zadeh, Seyed Mojtaba, Ray, Nilanjan, Erbilgin, Nadir

arXiv.org Artificial IntelligenceOct-26-2023

Semi-supervised object detection (SSOD) aims to improve the performance and generalization of existing object detectors by utilizing limited labeled data and extensive unlabeled data. Despite many advances, recent SSOD methods are still challenged by inadequate model refinement using the classical exponential moving average (EMA) strategy, the consensus of Teacher-Student models in the latter stages of training (i.e., losing their distinctiveness), and noisy/misleading pseudo-labels. This paper proposes a novel training-based model refinement (TMR) stage and a simple yet effective representation disagreement (RD) strategy to address the limitations of classical EMA and the consensus problem. The TMR stage of Teacher-Student models optimizes the lightweight scaling operation to refine the model's weights and prevent overfitting or forgetting learned patterns from unlabeled data. Meanwhile, the RD strategy helps keep these models diverged to encourage the student model to explore additional patterns in unlabeled data. Our approach can be integrated into established SSOD methods and is empirically validated using two baseline methods, with and without cascade regression, to generate more reliable pseudo-labels. Extensive experiments demonstrate the superior performance of our approach over state-of-the-art SSOD methods. Specifically, the proposed approach outperforms the baseline Unbiased-Teacher-v2 (& Unbiased-Teacher-v1) method by an average mAP margin of 2.23, 2.1, and 3.36 (& 2.07, 1.9, and 3.27) on COCO-standard, COCO-additional, and Pascal VOC datasets, respectively.

cascade regression, detection, teacher-student model, (11 more...)

arXiv.org Artificial Intelligence

2307.13755

Country: North America > Canada > Alberta (0.14)

Genre: Research Report (0.64)

Industry: Education (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.76)

Add feedback

Topic-driven Distant Supervision Framework for Macro-level Discourse Parsing

Jiang, Feng, He, Longwang, Li, Peifeng, Zhu, Qiaoming, Li, Haizhou

arXiv.org Artificial IntelligenceMay-23-2023

Discourse parsing, the task of analyzing the internal rhetorical structure of texts, is a challenging problem in natural language processing. Despite the recent advances in neural models, the lack of large-scale, high-quality corpora for training remains a major obstacle. Recent studies have attempted to overcome this limitation by using distant supervision, which utilizes results from other NLP tasks (e.g., sentiment polarity, attention matrix, and segmentation probability) to parse discourse trees. However, these methods do not take into account the differences between in-domain and out-of-domain tasks, resulting in lower performance and inability to leverage the high-quality in-domain data for further improvement. To address these issues, we propose a distant supervision framework that leverages the relations between topic structure and rhetorical structure. Specifically, we propose two distantly supervised methods, based on transfer learning and the teacher-student model, that narrow the gap between in-domain and out-of-domain tasks through label mapping and oracle annotation. Experimental results on the MCDTB and RST-DT datasets show that our methods achieve the best performance in both distant-supervised and supervised scenarios.

computational linguistic, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2305.13755

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California (0.14)
Asia > China > Hong Kong (0.05)
(8 more...)

Genre: Research Report (1.00)

Industry: Education (0.52)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.69)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Explaining Neural Scaling Laws

Bahri, Yasaman, Dyer, Ethan, Kaplan, Jared, Lee, Jaehoon, Sharma, Utkarsh

arXiv.org Machine LearningFeb-12-2021

The test loss of well-trained neural networks often follows precise power-law scaling relations with either the size of the training dataset or the number of parameters in the network. We propose a theory that explains and connects these scaling laws. We identify variance-limited and resolution-limited scaling behavior for both dataset and model size, for a total of four scaling regimes. The variance-limited scaling follows simply from the existence of a well-behaved infinite data or infinite width limit, while the resolution-limited regime can be explained by positing that models are effectively resolving a smooth data manifold. In the large width limit, this can be equivalently obtained from the spectrum of certain kernels, and we present evidence that large width and large dataset resolution-limited scaling exponents are related by a duality. We exhibit all four scaling regimes in the controlled setting of large random feature and pretrained models and test the predictions empirically on a range of standard architectures and datasets. We also observe several empirical relationships between datasets and scaling exponents: super-classing image tasks does not change exponents, while changing input distribution (via changing datasets or adding noise) has a strong effect. We further explore the effect of architecture aspect ratio on scaling exponents.

dataset, international conference, neural network, (14 more...)

arXiv.org Machine Learning

2102.06701

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > Santa Clara County > Mountain View (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback